Search results for "random forests"
Showing 10 of 10 documents
A Methodology to Derive Global Maps of Leaf Traits Using Remote Sensing and Climate Data
2018
This paper introduces a modular processing chain to derive global high-resolution maps of leaf traits. In particular, we present global maps at 500 m resolution of specific leaf area, leaf dry matter content, leaf nitrogen and phosphorus content per dry mass, and the leaf nitrogen/phosphorus ratio. The processing chain exploits machine learning techniques along with optical remote sensing data (MODIS/Landsat) and climate data for gap filling and up-scaling of in-situ measured leaf traits. The chain first uses random forest regression with surrogates to fill gaps in the database (over 45% of entries are missing) and to maximize the global representativeness of the trait dataset. Plant species are then a…
Application of selected methods of black box for modelling the settleability process in wastewater treatment plant
2017
The paper describes how measurements of inflow wastewater temperature in the chamber and the degrees of external and internal recirculation in the biological-mechanical wastewater treatment plant (WWTP) in Cedzyna near Kielce, Poland, were used to predict the settleability of activated sludge. Three methods, namely multivariate adaptive regression splines (MARS), random forests (RF), and modified random forests (RF+SOM), were employed to compute activated sludge settleability. The results of the analysis indicate that modified random forests demonstrate the best predictive ability.
A statistical approach to selecting soil quality monitoring indicators with Random Forests
2013
The volume of data and the large number of biological variables to be tested (one hundred) require analytical techniques, such as Random Forests, that can overcome the problem of multicollinearity when selecting indicators sensitive to various factors. The Random Forests methodology is appropriate for selecting the most discriminant variables. We therefore searched for the best way to select them by bringing together all biological variables representing the microflora and fauna. This approach focuses on impact indicators from the Bio2 program; flora indicators and accumulation indicators (snails) were not included. This work has been implemented on the three factors of discrimina…
Global Estimation of Biophysical Variables from Google Earth Engine Platform
2018
This paper proposes a processing chain for deriving global Leaf Area Index (LAI), Fraction of Absorbed Photosynthetically Active Radiation (FAPAR), Fraction of Vegetation Cover (FVC), and Canopy Water Content (CWC) maps from 15 years of MODIS data, exploiting the capabilities of the Google Earth Engine (GEE) cloud platform. The retrieval chain is based on a hybrid method that inverts the PROSAIL radiative transfer model (RTM) with Random forests (RF) regression. A major feature of this work is the implementation of a retrieval chain exploiting the GEE capabilities using global and climate data records (CDR) of both MODIS surface reflectance and LAI/FAPAR datasets, allowing the global estim…
Application of selected supervised classification methods to bank marketing campaign
2016
Supervised classification covers a number of data mining methods based on training data. These methods have been successfully applied to solve complex multi-criteria classification problems in many domains, including economic ones. In this paper we discuss features of some supervised classification methods based on decision trees and apply them to data from the direct marketing campaigns of a Portuguese banking institution. We discuss and compare the following classification methods: decision trees, bagging, boosting, and random forests. The classification problem in our approach is defined by a scenario in which a bank’s clients make decisions about activating their deposits. The obtained…
Classification of Melanoma Lesions Using Sparse Coded Features and Random Forests
2016
Malignant melanoma is the most dangerous type of skin cancer, yet it is the most treatable kind of cancer provided it is diagnosed early, which is a challenging task for clinicians and dermatologists. In this regard, computer-aided diagnosis (CAD) systems based on machine learning and image processing techniques have been developed to differentiate melanoma lesions from benign and dysplastic nevi in dermoscopic images. Generally, these frameworks are composed of sequential processes: pre-processing, segmentation, and classification. This architecture faces two main challenges: (i) each process is complex, requires tuning a set of parameters, and is specific to a given dataset; (ii) the…
Growing stock volume from multi-temporal Landsat imagery through Google Earth Engine
2019
Growing stock volume (GSV) is one of the most important variables for forest management and is traditionally estimated from ground measurements. These measurements are expensive and therefore sparse and hard to maintain regularly over time. Remote sensing data combined with national forest inventories constitute a helpful tool to estimate and map forest attributes. However, most studies on GSV estimation from remote sensing data focus on small forest areas with a single species or only a few. The current study aims to map GSV in peninsular Spain, a rather large and very heterogeneous area. Around 50 000 wooded land plots from the Third Spanish National Forest Inventory (NFI3) were u…
A Methodological Framework to Discover Pharmacogenomic Interactions Based on Random Forests
2021
The identification of genomic alterations in tumor tissues, including somatic mutations, deletions, and gene amplifications, produces large amounts of data, which can be correlated with a diversity of therapeutic responses. We aimed to provide a methodological framework to discover pharmacogenomic interactions based on Random Forests. We matched two databases, from the Cancer Cell Line Encyclopaedia (CCLE) project and the Genomics of Drug Sensitivity in Cancer (GDSC) project. For a total of 648 shared cell lines, we considered 48,270 gene alterations from CCLE as input features and the area under the dose-response curve (AUC) for 265 drugs from GDSC as the outcomes. A three-step reduction t…
Performance Dissimilarities in European Union Manufacturing: The Effect of Ownership and Technological Intensity
2021
Our paper addresses how relevant a set of continuous and categorical variables describing industry characteristics is to differences in performance between foreign and locally owned companies in industries with dissimilar levels of technological intensity. Using data on manufacturing-sector performance from 20 European Union member countries over the 2009–2016 period, we applied the random forests methodology to identify the best predictors of EU manufacturing industries’ a priori classification based on two main attributes: ownership (foreign versus local) and technological intensity. We found that EU foreign-owned businesses dominate locally owned ones in terms of size, wh…
Robust estimation of mean electricity consumption curves by sampling for small areas in presence of missing values
2017
In this thesis, we address the problem of robust estimation of mean or total electricity consumption curves by sampling in a finite population, both for the entire population and for small areas. We are also interested in estimating mean curves by sampling in the presence of partially missing trajectories. Indeed, many studies carried out at the French electricity company EDF, for marketing or power grid management purposes, are based on the analysis of mean or total electricity consumption curves at a fine time scale for different groups of clients sharing some common characteristics. Because of privacy issues and financial costs, it is not possible to measure the electricity consumption curve of eac…